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Abstract 

Background: Sorghum belongs to the tribe of the Andropogoneae that includes potential biofuel crops like 
switchgrass, Miscanthus and successful biofuel crops like corn and sugarcane. However, from a genomics point of 
view sorghum has compared to these other species a simpler genome because it lacks the additional rounds of 
whole genome duplication events. Therefore, it has become possible to generate a high-quality genome sequence. 
Furthermore, cultivars exists that rival sugarcane in levels of stem sugar so that a genetic approach can be used to 
investigate which genes are differentially expressed to achieve high levels of stem sugar. 

Results: Here, we characterized the small RNA component of the transcriptome from grain and sweet sorghum 
stems, and from F2 plants derived from their cross that segregated for sugar content and flowering time. We 
found that variation in miR172 and miR395 expression correlated with flowering time whereas variation in miR169 
expression correlated with sugar content in stems. Interestingly, genotypic differences in the ratio of miR395 to 
miR395* were identified, with miR395* species expressed as abundantly as miR395 in sweet sorghum but not in 
grain sorghum. Finally, we provided experimental evidence for previously annotated mlRNAs detecting the 
expression of 25 miRNA families from the 27 known and discovered 9 new miRNAs candidates in the sorghum 
genome. 

Conclusions: Sequencing the small RNA component of sorghum stem tissue provides us with experimental 
evidence for previously predicted microRNAs in the sorghum genome and microRNAs with a potential role in stem 
sugar accumulation and flowering time. 




Genomics 



Background 

Small RNAs (18-25 nt) regulate many developmental 
and physiological processes in plants through the regula- 
tion of gene expression at either the transcriptional or 
post-transcriptional level [1-3]. They can be subdivided 
into short-interfering RNAs (siRNAs) and microRNAs 
(miRNAs) [3-5]. 

MicroRNAs are derived from capped and polyadeny- 
lated primary (pri)-miRNA transcripts that are tran- 
scribed by RNA polymerase II and can form a hairpin- 
loop structure by intramolecular pairing [4,6]. Two 
sequential cleavages mediated by DICER LIKE 1 (DCLl) 
are required to produce a mature miRNA [4,7]. In the 
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first cleavage, DCLl cleaves near the base of the hair- 
pin-loop stem of the pri-miRNA to produce a miRNA 
precursor (pre-miRNA). The second cleavage takes 
place near the loop of the pre-miRNA to produce a 
miRNA/miRNA* duplex. The mature miRNA is then 
loaded into the RNA-induced silencing complex (RISC) 
and can guide the sequence-specific cleavage or transla- 
tional inhibition of target mRNAs [2,4,7,8], as well as 
gene silencing through DNA methylation [9,10], whereas 
the non-incorporated miRNA* strand is usually 
degraded. 

Through the use of next-generation sequencing, the 
small RNA component of the Arabidopsis and rice tran- 
scriptomes has been well characterized, more than in 
any other plant species [11,12]. This is reflected in the 
miRBase database (http://www.mirbase.org, release 16: 
September 2010), where 213 miRNAs are described for 
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Arabidopsis whereas 462 miRNAs are described for rice. 
Besides rice, the identification of miRNAs through deep 
sequencing in other grasses including maize, wheat, and 
Brachypodium have been described [13-15]. The identi- 
fication of rice, maize and wheat miRNAs from different 
tissues, developmental stages and stress-treatments 
[12,13,15-20], provides an opportunity to understand 
how miRNAs regulate the expression of genes influen- 
cing traits of agronomic importance. Currently, a trait of 
particular relevance for biofuel production is that of 
sugar accumulation in the stem of sorghum [Sorghum 
bicolor (L.) Moench] and sugarcane (Saccharum spp.), 
two closely related C4 grasses that diverged from each 
other about 8-9 million years ago [21]. 

In both species, sucrose is the main type of sugar and 
accumulates in the parenchyma tissue of the juicy stems 
[22,23]. High sucrose content is a highly desirable trait 
since the accumulated sugar can be fermented to pro- 
duce bioethanol as a source of renewable energy [24]. 
Although sugarcane has been extensively used as a 
source of biofuel, its use as a model system to under- 
stand the genetics of sugar accumulation is hampered 
by its complex genome, with several cultivars differing 
greatly in their ploidy levels [25]. Sorghum instead, is a 
diploid species and its genome has been recently 
sequenced [26]. In addition, the intra-species variation 
for sugar content is much more pronounced in sorghum 
than in sugarcane [27], with sorghum cultivars known as 
sweet sorghums accumulating high levels of sugars rela- 
tive to grain sorghums [28]. This makes sorghum a 
more suitable system to study the genetic basis of sugar 
accumulation. Still, the gene repertoire involved in sugar 
accumulation is not well characterized in sorghum due 
to the low heritability of the trait and its quantitative 
inheritance. In addition, previous reports have suggested 
the existence of trade-offs between sugar content and 
other plant traits such as flowering time [28,29]. 

We also observed that sugar accumulation (measured as 
Brix degree and referred herein as Brix) in the stem of 
grain sorghum BTx623 and sweet sorghum Rio cultivars 
differed at the time of flowering. Interestingly, 80% of the 
differentially expressed genes in stem tissue between the 
two cultivars had orthologous counterparts in syntenic 
positions in rice [30,31]. This suggested that the ability of 
sorghum to accumulate soluble sugars relative to rice 
could not be explained by differences in their gene content 
but rather due to gene regulation at either the transcrip- 
tional or post-transcriptional level. To address the latter 
possibility, we characterized the small RNA portion of 
transcriptomes derived from stem tissues of grain and 
sweet sorghum in order to investigate the microRNA- 
mediated regulation of genes involved in sugar accumula- 
tion and flowering time. Using the SOLID next generation 
sequencing system, we sequenced with an unprecedented 



depth small RNAs libraries from BTx623 and Rio, and 
from a pool of selected F2 plants derived from their cross 
that differed in sugar content and flowering time. We also 
reasoned that plant stems would provide us with a repre- 
sentative tissue to experimentally validate the previously 
predicted miRNAs of the sorghum genome [26]. Indeed, 
we not only detected the expression of 25 miRNA families 
from the 27 predicted families in the sorghum genome but 
also discovered 9 new miRNA candidates. Furthermore, 
we could correlate genotypic variation of miRNA expres- 
sion with the sugar and flowering phenotypes. In addition, 
we found that the size distribution of small RNAs in sor- 
ghum stems was quite heterogeneous, characterized by 
RNAs with at least 25 nt in length that were mainly 
derived from ribosomal and transfer RNAs not annotated 
in the sorghum genome. 

Results 

Deep-sequencing of small RNAs from grain and sweet 
sorghum stems 

We constructed five small RNAs libraries from sorghum 
stem tissue at the time of flowering and sequenced 
them using the SOLID platform. The libraries comprised 
samples from BTx623, Rio, low Brix and early flowering 
F2 plants (LB/EF F2s), high Brix and late flowering F2 
plants (HB/LF F2s), and a "mixed library" (Mix), where 
small RNAs from the previous four libraries were mixed 
in equal proportions (Figure 1). 

We obtained a total of 38,336,769 sequence reads, 
from which 23,008,945 (60%) matched perfectly to the 
BTx623 reference genome (Table 1). The reads with 
perfect matches that derived from repeats constituted 74 
to 77% of the total reads depending on the library 
(Figure 2a). The non-redundant set of reads comprised 
2,539,403 sequences, and the reads that were sequenced 
only once (termed here "singlets") comprised 2,167,946 
sequences, corresponding only to 9% of the perfect 
matches (Table 1), suggesting that our sequencing 
reached a high level of saturation. If we define a cluster 
as two or more reads with identical sequences, the num- 
ber of clusters found ranged from 20,056 in the BTx623 
library to 164,623 in the HB/LF F2s library (Table 1). 

Diversity in the small RNA content of sorghum stems 

The frequency and size distribution of small RNAs from 
sorghum stems revealed two interesting aspects: a peak 
of 25 nt small RNAs with similar abundance as the 24 
nt class, and a second peak of small RNAs with 22 nt 
that were more abundant than the 20 and 21 nt classes, 
respectively (Figure 2b). This finding contrasted with the 
size distribution of small RNAs described for several 
monocot species (including small RNAs from sorghum 
inflorescence), in which the most abundant small RNAs 
were 21 and 24 nt in length, with maize being the 
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Figure 1 Selection of sorghum plants and construction of 
stem-derived small RNA libraries for deep sequencing (a) Grain 
sorglium BTx623 witli low Brix and early flowering phenotype, was 
crossed with sweet sorghum Rio with high Brix and late flowering 
phenotype and an F2 population was created. A total of 553 F2 
plants were phenotyped for flowering time (measured as the total 
number of leaves at flowering) and Brix degree. Using a bulked 
segregant analysis (BSA) approach, we selected an equal number of 
F2 plants with low Brix and early flowering (LB/EF) and with high 
Brix and late flowering (HB/LF) phenotype, respectively, (b) A flow 
chart describing the procedure for small RNA library construction 
and sequencing, (c) Histograms displaying the Brix degree and 
flowering time data obtained from plants grown in the field. We 
selected 1 1 LB/EF F2s displaying Brix degree < 5 and number of 
leaves < 9, whereas the 1 1 HB/LF F2s selected displayed a Brix 
degree > 13 and number of leaves > 14. 



exception, showing a larger 22 nt peak relative to the 21 
nt peak [13]. This led to the hypothesis that the 22 nt 
class of small RNAs are specific to maize [13]. However, 
we have shown here that a 22 nt peak is also present in 
sorghum stem tissue. Furthermore, we found that a high 
proportion of the 22 nt reads were derived from 
miR172c, accounting for approximately 15% of all the 
22 nt reads in the BTx623 library (Figure 2c). Our 
results differ from the predicted length of 20 nt for 
miR172c annotated in the miRBase database. Interest- 
ingly, MIR172c is located within the third intron of the 
Sb04g037375 gene. 

The finding of small RNAs of 25 nt in length with 
such high abundance was unexpected. This prompted us 
to investigate whether they could be derived from ribo- 
somal and/or transfer RNA genes that had not yet been 
annotated in the sorghum genome. Furthermore, since 
the sequencing read length of the SOLID system at the 
time of our experiment was limited to a maximum of 25 
nucleotides, it is possible that these RNAs are longer. In 
order to address this question, we analyzed several loci 
in the genome that accumulated more than thousand 
reads (defined as 25 nt hotspots) and found indeed that 
they were derived from non-annotated rRNA and tRNA 
genes (Additional File 1, Table SI). 

In summary, we showed that the small RNA compo- 
nent from the stem transcriptome of sorghum is charac- 
terized by small RNAs of 22 nt in length that are mainly 
derived from miR172c, and by a size class of RNAs with 
at least 25 nt in length that are predominantly derived 
from rRNAs and tRNAs genes that had not been anno- 
tated in the sorghum genome. 

Genotypic variation in ttie expression of l<nown miRNAs 
between grain and sweet sorgtium correlated witti sugar 
content and flowering time in the F2 population 

The sequencing consortium of the sorghum genome 
identified 149 predicted miRNAs belonging to 27 
miRNA families [26], and we could detect the expres- 
sion of miRNA members from 25 families based on the 
following criteria: a miRNA family was considered 
expressed only if its sequencing reads were detected in 
at least three libraries and with a frequency of 10 reads 
or more for the sum of the five libraries. A list with the 
reads count for each known miRNA family is provided 
in Additional file 2, Table S2. 

The most abundantly expressed miRNA family was 
miR172 (Figure 3a), comprising almost 6% of the total 
reads with perfect match to the BTx623 genome. The 
rest of the known miRNAs had abundances below 0.5% 
(Figure 3b). When the ratio of miRNA abundances 
between the BTx623 and Rio libraries was compared to 
the ratio between the LB/EF F2s and HB/LF F2s 
libraries, we could identify miRNA families whose 
expression differences between the parents were 
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Table 1 Deep sequencing statistics of stem-derived small RNAs 



Library 


# raw sequences 


# perfect matches 


% 


# singlets 


% 


# clusters 


Non-redundant set 


% 


Mix 


4,023,513 


2,547,108 


63 


276,044 


11 


35,083 


311,127 


8 


BTx623 


2,115,266 


1,348,361 


64 


169,063 


12 


20,056 


189,119 


9 


Rio 


3,173,601 


2,180,988 


69 


234276 


11 


31,563 


265,839 


8 


LB/EF F2s 


11,974,953 


7,472,940 


62 


653,279 


9 


120,132 


773,41 1 


6 


HB/LF F2s 


1 7,049,436 


9,459,548 


55 


835,284 


9 


1 54,623 


999,907 


6 


Total 


38,336,769 


23,008,945 


60 


2,167,946 


9 


371,457 


2,539,403 


8 



inherited in the F2 plants (Figure 3c). Considering a cut- 
off level of two-fold change in miRNA expression, we 
found that miR169 and miR172 were expressed higher 
in BTx623 relative to Rio, and higher in LB/EF F2s com- 
pared to HB/LF F2s. This means that high expression of 
these miRNAs in BTx623 correlated with low Brix and 
early flowering in the F2 plants selected, and the oppo- 
site was true for miR395 (Figure 3c). Although the 
expression difference of miR160, miR164 and miR319 
between BTx623 and Rio was inherited in the F2, and 
thus of interest for further analysis, it was less than two 
fold; so we decided to focus on miR169, miR172 and 
miR395 instead. The observation that high expression of 
miR172 correlated with early flowering is consistent 
with the reported role of this miRNA in the promotion 
of flowering [32-36]. 

Although miR169 and miR395 have known roles in 
drought stress and sulphur starvation, respectively 
[37,38], our data suggested a possible function for these 
miRNAs in sugar accumulation and flowering time. 
Because the pool of F2 plants used for library construc- 
tion were selected based on both phenotypes, it was not 
possible to assign the expression inheritance pattern of 
both miRNAs to either sugar accumulation or flowering 
time alone. For this reason, additional plants from the 
same F2 population differing in sugar content but with 
similar flowering time were selected and the expression 
of a representative member from each miRNA family, 
miR169d and miR395f respectively, was quantified using 
the TaqMan assay. We found that high expression of 
miR169d in BTx623 correlated with low Brix (Figure 
3d). This suggested that high expression levels of 
miR169 might lead to a reduction in stem sugar content 
regardless of flowering time. Surprisingly, high expres- 
sion of miR395f in Rio relative to that in BTx623 did 
not correlate with sugar content in F2 plants (Figure 
3e). This might indicate that high expression of miR395 
would be required for flowering regardless of sugar con- 
tent in the stem. Consistent with the role of miR172 in 
flowering, we did not observe any difference in the 
expression of miR172a in F2 plants with the same flow- 
ering time but different Brix (Figure 3f). 

In summary, high expression of miR172 in BTx623 
correlated with early flowering in the F2, whereas the 



opposite was true for miR395, high expression of this 
miRNA in Rio correlated with late flowering in the F2 
plants selected. Regarding sugar content in the stem, 
high expression of miR169 in BTx623 correlated with 
low Brix in the F2 plants selected. 

Genotypic variation in the miR395/miR395* ratio 

We detected the expression of the miRNA* for all 
MIR395 gene copies and this was more evident in Rio 
compared to BTx623, and in some instances the abun- 
dance of miR395' was even higher than that of miR395 
such as the case of miR3951* for instance (Figure 4a). 
Indeed, when the miR395/miR395* ratio was calculated 
for each library, we found that miR395 reads were 
approximately 6 times more abundant than miR395* 
reads in the BTx623 library (Additional file 2, Table S2). 
By contrast, the abundance of miR395 relative to 
miR395'* was in equal proportions in the Rio library. 
Our data highlighted a genotypic difference in the ratio 
between miR395 and miR395*, with a switch in strand 
abundance from BTx623 to Rio (Figure 4b). 

The FRL2 and RR3 genes are novel targets of miR172 

Although our data might suggest a possible function of 
miR169 in sugar content and miR395 in flowering time, 
we could not detect any predicted target related to car- 
bohydrate metabolism and flowering time respectively 
(Additional file 3, Table S3 and Additional file 4, Figure 
SI). Thus, the expression of miR169 and miR395 target 
genes, and their correlation with Brix and flowering 
phenotypes remains to be elucidated. Regarding the 
miR172-predicted targets, we detected cleavage products 
for the genes INDETERMINATE SPIKELET 1 (IDSl) 
and an AP2 transcription factor (Additional file 3, Table 
S3; Additional file 4, Figure SI; and Additional file 5, 
Figure S2). Furthermore, when the expression of these 
two miR172 target genes was tested, we found that they 
were expressed higher in Rio compared with BTx623 as 
expected. However, we could not find a correlation 
between their expression levels with the flowering phe- 
notype in the F2 pools of plants selected (data not 
shown). 

A FRIGIDA-like 2 (FRL2) and a TYPE A RESPONSE 
REGULATOR 3 {RR3) were predicted as new targets of 
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Figure 2 Diversity in the small RNA content of sorghum stems 

(a) Mapping of small RNAs (18-25 nt) with perfect match to 
different elements of the BTx623 reference genome with the term 
"other" representing intergenic regions, (b) Frequency and size 
distribution of small RNAs reads, (c) A high proportion of 22 nt 
reads in each library are derived from miR172c locus. The small RNA 
reads derived from miR172c in sorghum stem tissue are 22 nt in 
length in contrast to the previously predicted length of 20 nt. 



miR172 with the cleavage product of FRL2 experimen- 
tally validated in this study (Additional file 5, Figure S2). 
The f7?/G/DA-related genes are a major determinant of 
natural variation in the winter-annual habit between 
Arabidopsis accessions [39,40], whereas the TYPE A 
RESPONSE REGULATOR 3 {ARR3) has a function in 
the circadian clock [41]. Although sorghum is a crop 
from semi-arid regions [26], the miR172-mediated post- 
transcriptional regulation of FRL2 might have a role in 
the adaptation of sorghum to temperate climates. Con- 
sistent with this, a role of miR172 in the regulation of 
flowering time by ambient temperature in Arabidopsis 
has been recently described [42]. 

Identification of new miRNAs 

The miRDeep pipeline [43] was adapted for de novo 
detection of miRNAs in sorghum (Additional file 6, 
Figure S3). From an original set of 223 predicted hair- 
pins in the sorghum genome, 9 met the miRNA anno- 
tation criteria previously established [44], (Table 2 and 
Additional file 7, Figure S4). All the new miRNAs have 
predicted genes as targets except miR5389 (Additional 
file 8, Figure S5). All predicted 9 miRNAs met the 
expression criteria used above for known miRNAs 
(Figure 5 and Additional file 9, Table S4). From all 
miRNAs whose expression could be detected in sor- 
ghum stems, two of them were found to be within 
introns of protein coding genes, these included 
miR172c and miR437g. 

From the newly identified miRNAs, miR5386, and 
miR5388 displayed allelic variation in expression 
between BTx623 and Rio that was inherited in the F2 
offspring (Figure 5). However, the predicted target genes 
for miR5386 did not include any transcript involved 
neither in flowering nor in carbohydrate metabolism. 
This was a similar case with miR5388, with no predicted 
targets involved in flowering but with two genes 
involved in carbohydrate metabolism as predicted tar- 
gets, encoding the beta subunit 1 and 2 of the Snf-1 
related protein kinase (SnRKl) respectively [45] (Addi- 
tional file 8, Figure S5). 

We next attempted to experimentally validate the 
miRNA-mediated cleavage of predicted targets. Potential 
miRNA target sites were scored from 0 to 8 (see Meth- 
ods), with higher scores indicating less confidence in the 
predictions. We tested 14 predicted targets with scores 
less than 4 but we could not detect the miRNA- 
mediated cleavage for any of them. A low rate in target 
validation has also been observed for newly predicted 
miRNAs in tomato, with three targets validated from 65 
predicted targets that were tested [46]. Recently, a simi- 
lar case of very low rate in target validation was 
reported for predicted targets of new miRNAs identified 
in Arabidopsis lyrata [47]. 
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Figure 3 Genotypic variations in miRNA expression, (a) The miR172 is the most abundantly expressed miRNA in sorghum stems, (b) The rest 
of the known mlRNAs were expressed at very low abundance (less that 0.5% of the total reads in the library) in stem tissue, (c) The miRNA 
abundances were used to calculate their relative fold change in expression between BTx623 and Rio, and between the LB/EF F2s and HB/LF F2s 
ibraries, respectively. Positive values in the y-axis of the graph denote fold changes in miRNA expression that are higher in BTx623 relative to Rio 
and higher in LB/EF F2s relative to HB/LF F2s libraries, respectively; the opposite is true for negative values. The expression of miR169 and 
miR172 was at least twice as high in BTx623 relative to that in Rio and this difference was inherited in the F2. The opposite was true for miR395 
expression, (d-f) Quantification of miRNA expression through Taqman Assay in pools of 10 F2 plants each with similar flowering time (10-11 
leaves) but different sugar content (Brix 3-5 vs Brix 13-16), respectively, (d) High expression of miR169d in BTx623 relative to Rio correlates with 
low Brix in the F2 independently of flowering time, (e-f) F2 plants with similar flowering time display no differences in miR395f and miR172a 
expression regardless Brix degree. 
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(a) 



GTTCCCTTCAAGCACTTC ACATG 6AGCATTATTTGTCTTGG AG AAAG CTTAATTTGATGCATTGTG AAGTGCTTGG GGG AACTC 

AGTTCCCTTC AAGCACTTCACA,bc03, 5 
AGTTCCCTTCAAG CACTTCACA, bc05,2 

GTTCCCTTCAAGCACTTC ACA, bcOl, 54 

GTTCCCTTCAAGCACTTCA,bc03,42 

GTTCCCTTCAAGCACTTCACA,bc03,117 

GTTCCCTTCAAGCACTTCACAT, bc04, 1 

GTTCCCTTCAAGCACTTCAC,bc05,2 
TCCCTTCAAG CACTTCACA, bc05, 1 

TGAAGTGCTTGGGGGAAC,bc01,10 
TG AAGTG CTTGGGG G AACTC,bc01, 1 
TG AAGTGCTTGG G G G AAC,bc02,4 
TGAAGTGCTTGGGGGAACTC,bc02,3 
TG AAG TG CTTGGG G G AAC,bc03, 18 
TGAAGTGCTTGGGGGAACTC,bc03,4 
TG AAGTG CTTGGGG GAAC,bc04,9 
TG AAGTG CTTGGGG G AACTC, bc04, 6 
TG AAGTG CTTGGGG GAAC,bc05,33 
TG AAGTG CTTGGGG GAACT,bc05,3 
TG AAGTG CTTGGGG GAACTC,bc05,5 
GAAGTGCTTGGGGGAACTC,bc04,l 
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Figure 4 miR395* is tiighly abundant in Rio. (a) Small RNA reads derived from MIR395I are depicted. The miR395l strand sequence is shown 
in red whereas the miR39SI* strand sequence is in orange color. In green and blue color are small RNA reads sequenced from BTx623 and Rio 
ibraries respectively. The designation next to the small RNA reads refer to the library (bcOl: Mix; bc02: BTx623; bc03: Rio; bc04: LB/EF F2s and 
bc05: HB/LF F2s), followed by the number of times the small RNA read was sequenced. In the BTx623 library, only reads derived from miR395 
were detected whereas in the Rio library, most of the reads where derived from miR395l* instead, (b) IVlodel depicting the genotypic variation in 
miR395/miR395* ratio where in Rio a switch towards miR395* strand production has occurred relative to BTx623. Based on miR395* high 
abundance in Rio, we postulate here the hypothesis that miR395* species could have a functional role in the regulation biological processes 
other than the sulfur metabolism previously described for miR395. 



Discussion 

Here we have described the first characterization of the 
small RNA component of the transcriptome from sor- 
ghum stems. The choice of stems as plant material is 
interesting not only because it is the tissue were 



fermentable sugars do accumulate, but it is also the 
venue for the movement of small RNA duplexes (siR- 
NAs and miRNAs) from source to sink tissues, as have 
been recently demonstrated [48,49]. Thus, one could 
expect the small RNA component of the stem to be 
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quite diverse or heterogeneous. Indeed, the unexpected 
finding of a high abundance peak of RNAs with 25 nt 
or more in length lead us to the finding of rRNA and 
tRNA genes that have not been annotated yet in the 
sorghum genome. We have also shown that the abun- 
dance of the 22 nt small RNAs in sorghum stem tissue 
was greater than the 20 and 21 nt small RNAs respec- 
tively. Our results contrast the recently proposed notion 
that the 22 nt peak of small RNAs is exclusive of maize 
[13]. Furthermore, we found that up to 15% of all the 
22 nt small RNAs in the BTx623 library were derived 
from miR172c, which has been previously predicted to 
have a length of 20 nt (Paterson et al. 2009). Recently, 
22 nt miRNAs have been described to trigger siRNA 
biogenesis from target transcripts in Arabidopsis [50,51]. 
Thus, it would be interesting to test if miR172c can also 
trigger siRNA biogenesis in sorghum. 

As expected, the specific genetic material, tissue sam- 
ple and developmental stage used in our study, allowed 
us to capture a broad spectrum of the small RNA com- 
ponent of the sorghum transcriptome. On the other 
hand, the specificity of the material also permitted us to 
gain new insights into how complex traits like sugar 
accumulation and flowering time might be regulated at 
the post-transcriptional level. Such regulation of gene 
expression could provide an opportunity to manipulate 
biofuel traits, where stem sugar rather than cellulose 
and increased biomass because of delayed flowering 
could be enhanced [52] . By taking a genetic approach in 
conjunction with deep-sequencing of stem-derived small 
RNAs, we were able to correlate variation in miRNA 
expression between grain and sweet sorghum, with the 
sugar and flowering phenotypes of selected F2 plants 
derived from their cross. However, analysis of the differ- 
ential accumulation of potential target genes did not 
exhibit a simple correlation with miRNA levels. There- 
fore, further studies will be required to unveil the 
underlying mechanisms between genotype and 
phenotype. 

In the case of miR395, it is interesting to note that 
there was genotypic variation in the miR395/miR395* 
ratio, with the Rio genotype expressing both strands at 
equal proportions in contrast to a clear predominance of 
miR395 abundance over miR395* in BTx623 (Figure 4b). 
This is reminiscent of the recently proposed "arm switch- 
ing" model of miRNA evolution described for nematodes 
species [53], in which the mature miRNA is produced 
from the 5' arm of the miRNA hairpin in a particular spe- 
cies but in a different nematode species the 5' arm of the 
same MIR gene gives rise to the miRNA* instead. Inter- 
estingly, it has been shown recently that miRNA* species 
have physiological relevance in Drosophila, since a signif- 
icant number of them are well conserved, can be loaded 
into the RISC complex through their preferential 



association with ARGONAUTE2 (AG02) rather that 
AGOl, and can also regulate the expression of target 
genes [54]. Furthermore, the regulatory potential of 
miRNA* species in vertebrates has been recently demon- 
strated as well [55]. 

Conclusions 

Based on the above, several interesting questions can be 
formulated. First, does miR395* have any regulatory 
potential? Second, what is the mechanism behind the 
genotypic difference in miR395/miR395* ratio? Third, is 
this ratio altered in a developmental and/or tissue 
dependent manner? Fourth, is this an example of a gen- 
eral phenomenon? If this is the case, we would envision 
that other miRNAs families as well will display differ- 
ences in their miRNA/miRNA* ratio dependent on the 
genotype and/or condition. Future work will be required 
to provide a better understanding of miR395's involve- 
ment in processes other than its previously described 
role in sulfur metabolism. 

Methods 

Plant material 

The grain (BTx623) and sweet (Rio) sorghum cultivars 
together with F2 plants derived from their cross were 
grown in the field of the Waksman Institute during the 
summer of 2008. The juice from three internodes of the 
main stem was harvested at the time of flowering and 
the Brix degree measured as previously described 
[30,31]. The average Brix degree from three internodes 
per plant was used. Flowering time was measured as the 
number of leaves in the main stem at the time of 
anthesis. 

In total, 15 plants for each parent and 553 F2 plants 
were scored for Brix degree and flowering time. The F2 
plants selected for sequencing had either low Brix (Brix 
< 5)/early flowering (NO leaves < 9) or high Brix (Brix > 
13)/late flowering (NO leaves > 14). 

Construction of small RNA libraries 

Total RNA from internode tissue was extracted at the time 
of flowering with the mirVana miRNA isolation kit 
(Ambion). RNA extraction was performed in 5 indepen- 
dent plants for each BTx623 and Rio, and 1 1 independent 
plants for each low Brbc/early flowering and high Brix/late 
flowering F2 plants respectively. The total RNA (1 [ig per 
sample) was pooled and then fractionated with the flash- 
Page fractionator (Ambion) to isolate RNAs smaller that 
40 nt in length. The isolated small RNAs were used to 
construct small RNA cDNA libraries with the SOLID 
small RNA library construction kit (Ambion). The sequen- 
cing was carried out at the Waksman genomics laboratory 
on the SOLID 3 platform, which has a read length limit of 
25 nt http://solid.rutgers.edu. 
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Table 2 List of new predicted MIR genes in the sorghum genome 


MIR gene 
ID 


Position 


Strand 


miRNA 
size 


miRNA sequence 5'-3' 


miRNA* sequence 5'-3' 


miRNA* 
size 


sbi-MIR5381 


Chi: 574388.574497 


+ 


19 


AAGATCTGTGGCGCCGAGC 


TCGGCGCTAAGATCTCTGG 


19 


sbi-MIR5382 


Ch2: 1930828.. 1930937 


-1- 


18 


CCAATCTAAACAGGCCCT 


GACCTGTTTAGATrGGGA 


18 


jkJI JVlll\^JOJ 


Ch4: 

43242765..43242874 




24 


ATGAfAGAGrrrrGGrAGAGATAT 


TTrTrrGrrGAGnTATCTGTGG 


23 


sbi-MIR5384 


Ch4: 

45785396..45785505 


+ 


18 


CGCGCCGCCGTCCAGCGG 


CrrGGCCGGTGCACGCGTC 


19 


sbi-MIR5385 


Ch6: 

5630751 7..56307626 


+ 


22 


ACCACCAACCCCACCGCrrCTC 


GAAGCGGTGGTGTTGGTGGTGA 


22 


sbi-MIR5386 


Ch7: 877244.877353 


+ 


20 


CGTCGCTGTCGCGCGCGCTG 


GGTCAGGGCAGAGCACGCA 


19 


sbi-MIR5387 


Ch7: 

15969322..1 5969431 


+ 


25 


TAACACGAACCGGTGCTAAAGGATC 


CCCTTTAGCACCGGTTCGTGrrACA 


25 


sbi-MIR5388 


Ch8: 1629110..1629219 


+ 


22 


ATCTTTGCCGGGTGTCrCTGAC 


CAGCAAACATTCGGCAAAGAAAA 


23 


sbi-MIR5389 


Ch8: 4848342.4848451 


+ 


21 


GOTGAGTTTATCAGCCGAGT 


ATGGCITATCAGCCAAGTGA 


20 



*AII the small RNA reads mapped to "chromosome_7_22.BC_03" were derived from the predicted miRNA* strand. 



Bioinformatic analysis 

We mapped the 25 nt long reads to the sorghum gen- 
ome using the SHRiMP program version 1.0.5 [56], with 
default parameter settings except that the number of 
matches was limited to 10. SHRiMP allowed us to per- 
form the alignments in SOLiD's colorspace. For the 
further analyses we used only alignments that matched 
perfectly to the genome starting from the first position 
in the read up to the sequencing primer. Because the 
SOLID 3 platform had a read length limit of 25 nucleo- 
tides, adaptor sequences did not have to be trimmed 
prior mapping to the genome. As a consequence, we 
could estimate the length of an individual sequence read 
by one base with a probability of 0.25. These reads were 
then clustered with Vmatch http://vmatch.de/ to reduce 
the number of identical reads for downstream analyses. 
We required 100% identity among the sequences of a 
cluster. We have further filtered the clustered reads 
against the repetitive elements of sorghum and used the 
remaining sequences for de novo prediction of miRNA 
using miRDeep. 

We defined a 25 nt "hotspot" as those loci in the gen- 
ome that displayed a high coverage of 25 nt reads, in 
our case thousand reads. The length of the hotspot was 
determined as each consecutive interrogated base that 
had more than thousand 25 nt reads mapped to it. 

Quantification of mIRNA expression 

The TaqMan MicroRNA Assays (Applied Biosystems) 
was used to quantify the expression of miR172a, and the 
Custom TaqMan Small RNA Assays (Applied Biosys- 
tems) was used to quantify the expression of miR169d 
and miR395f respectively. The qRT-PCR reaction was 
done using the MyiQ Real-Time PCR Detection System 
(BIO-RAD Laboratories, Inc.). A relative quantification 
normalized against unit mass (10 ng total RNA) was 
performed as previously described [30]. 



De novo discovery of sorghum miRNAs 

For de novo prediction of potential miRNAs, we have 
used the miRDeep package [43]. As miRDeep does not 
take colorspace alignment as input, we had to reshape 
the output to miRDeep's blastparse format. Moreover, 
the SHRiMP alignment scores and the score used had 
to be recalculated to fit miRDeep's blastparse format. 
We used the same formula and method as described 
[57]. At this point, we also had to translate the color 
space two base encoding sequences into standard 
nucleotide base space sequences. As we considered only 
perfectly matching reads after the initial alignment to 
the genome, we could easily translate from color space 
to base space sequence format. The subsequent de novo 
calling of miRNAs was carried out as described [43,57]. 

Finally, the coordinates of de novo miRNAs that were 
predicted on the minus strand were corrected as miR- 
Deep refers the coordinates to the 5' end of the minus 
strand. Though, conventionally the coordinates refer 
always to the 5' end of the plus strand. The GenBank 
accession numbers for the new miRNAs are sbi- 
MIR538.sqn sbi-MIR5381 JN205291; sbi-MIR538.sqn 
sbi-MIR5382 JN205292; sbi-MIR538.sqn sbi-MIR5383 
JN205293; sbi-MIR538.sqn sbi-MIR5384 JN205294; sbi- 
MIR538.sqn sbi-MIR5385 JN205295; sbi-MIR538.sqn 
sbi-MIR5386 JN205296; sbi-MIR538.sqn sbi-MIR5387 
JN205297; sbi-MIR538.sqn sbi-MIR5388 JN205298; sbi- 
MIR538.sqn sbi-MIR5389 JN205299. 

We have also validated all potential new miRNAs 
according to the annotation criteria proposed by [44]. 

Target prediction and validation 

We have used the novel miRNAs for a target prediction. 
Firstly, we compared the sequences to the unspliced 
transcripts of sorghum [26], with BLASTN using these 
parameters: -F F -W 7 -e 1 -q -2 -G -1. We scored each 
base of the alignment according to these criteria: match 
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Figure 5 Genotypic variations in the expression of new miRNAs. (a) The frequency count of small RNAs for each new miRNA was used to 
calculate its abundance, (b) The mlRNA abundances were used to calculate their relative fold change in expression between BTx623 and Rio, and 
between the LB/EF F2s and HB/LF F2s libraries, respectively. Positive values in the y-axis of the graph denote fold changes in mlRNA expression that 
are higher in BTx623 relative to Rio and higher in LB/EF F2s relative to HB/LF F2s libraries, respectively; the opposite is true for negative values. The 
miRNA miR5383 was not included in the graph because it was not detected in the Rio library (see Additional file 9, Table S4). 



as 0; GU pairs as 0.5; gaps as 2; all other pairs were 
scored as 1. We doubled the score within the first 13 
bases of the miRNA/ alignment. We considered the gene 
as a potential target if the total score of the alignment 
was equal to or less than 8. 

The miRNA-mediated cleavage of mRNAs was per- 
formed through a modified procedure of the RLM- 
RACE protocol from Invitrogen. The sequences of the 
reverse primers used in the modified RACE are: 
Sb01g044240 (5' GCCCATATGGACGGAAGATA 3'); 
Sb02gG07G00 (5' CTGGTAGCCGGAGAACAACT 3') 
and Sb06g03067G (5' TTTCATCAGTGCTTGCCAAT 



3'). The validation of predicted targets was performed in 
BTx623 or Rio cultivars only. Annotation of the miRNA 
gene targets was based on the Phytozome database 
http:/ /www.phytozome.net. 

Additional material 



Additional File 1: Table S1 - 25 nt hotspots in the sorghum 
genome 

Additional file 2: Frequency counts of small RNA reads for known 
microRNA families. Table S2 displays a quantitative analysis of 
microRNAs. 
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Additional file 3: Predicted targets of miR169, miR172, and miR395 

Table S3 provides a list of predicted target genes of miR169, miR172, and 
miR395. 

Additional file 4: Targets of predicted for miR169, miR172 and 
miR395 microRNAs. Figure SI displays an alignment between miR169, 
miR172 and miR395 microRNAs and their target sequences. 

Additional file 5: Mapping of miR172-guided cleavage sites in 
predicted target genes. Figure S2 displays an alignment of miR172 with 
its target sequences and cleavage sites. The locations of the mlRNA- 
cleavage sites are indicated with downward arrows and the frequency of 
the cleavages are indicated as the number of clones for each RACF 
product with respect to the total clones sequenced. 

Additional file 6: Pipeline for the de novo miRNA detection Figure 
S3 presents a diagram of computational steps involved in de novo 
mlRNA detection. All reads from SOLiD sequencing were mapped in 
colorspace to the sorghum genome using SHRilVlP. Perfect matching 
reads were clustered with Vmatch then filtered against the sorghum 
repeat sequences and compared with know sorghum mlRNAs to classify 
them. The remaining sequences were taken for de novo mlRNA 
prediction using miRDeep. 

Additional file 7: Hairpin structures of the newly discovered 
miRNAs. Figure S4 presents a collection of hairpin structures from newly 
discovered miRNAs. Sequences are depicted together with the frequency 
distribution of the small RNA reads aligned to the hairpin. The 2D hairpin 
structure produced by the miRDeep software is also shown. 

Additional file 8: Predicted targets for the newly discovered 
miRNAs in sorghum. Figure S5 presents a list of alignments between 
new discovered miRNAs and their predicted targets in sorghum. 

Additional file 9: Frequency counts of small RNA reads for new MIR 
genes. Table S4 displays a quantitative analysis of new MIR genes. 
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